Empirical Force Fields and their Applications with Biological Macromolecules
Introduction:
Biological macromolecules have been studied extensively in recent years in the field of computational chemistry. These macromolecules consist of smaller organic molecules combined to form longer, complex chains.(1) There are four categories of macromolecules: carbohydrates, proteins, lipids, and nucleic acids. Of the four, proteins are the “most structurally complex macromolecules known”.(2) For this reason, the primary focus of this paper is on proteins and, more specifically, protein folding. Proteins are polymers built from amino acid monomers. Approximately 20 different amino acids are found in the human body, serving a number of purposes. Proteins are formed when these amino acids combine through dehydration synthesis, forming an amide linkage, or peptide bond.(2)
Different combinations of these amino acids result in different chemical properties, physical properties, and biological functions, making each protein unique. After a protein is formed, it is considered to be in an “unfolded” state. The atoms then adjust and reach an “intermediate” stage before finally reaching the “native” conformation. This process, known as protein folding, has been the focus of many recent studies as scientists have realized its importance. Research has shown that, under various conditions, aggregates may form instead of the native state.(12) These aggregates have been linked to many diseases, including Alzheimer's disease, mad cow disease, and even cystic fibrosis.(3) The p53 protein is associated with preventing cells with damaged DNA from dividing, and so plays a major role in preventing cancer. One mutation associated with p53 leaves too little of the protein folded in the proper form to prevent cell division.
It is clear that protein folding must be studied in greater depth to understand the process and, potentially, how to manipulate it.(3)
Theory:
At first, understanding protein folding seems like an intractable problem. Unraveling it can start with determining the native state of proteins. Determining these structures experimentally is difficult, which is why the focus has shifted to theoretical methods of predicting the conformational structures, a task known as the “protein folding problem.”(4) Conformational analysis is the study of these 3-D conformations and the associated changes in properties.(4) It deals with systems in equilibrium, meaning the system reaches a global minimum of free energy. For proteins, this global minimum corresponds to the native state. The macroscopic state of the system at the global minimum contains many individual atoms that adopt different microscopic states. Boltzmann's principle describes the system in this state and relates the energy to the probability density function, $p$. Boltzmann statistics applies in the high-temperature limit, also called the classical limit.(5) The total energy of the N-body system is:
(1) $E = N\bar{\varepsilon} = kT^{2}\left(\frac{\partial \ln Q}{\partial T}\right)_{N,V} = N\,\frac{\sum_j \varepsilon_j e^{-\varepsilon_j/kT}}{q}$ (5)
The summation is taken over the single-molecule energies $\varepsilon_j$ because the molecules are considered to be independent of one another. The average energy of a particle is then:
(2) $\bar{\varepsilon} = \frac{\sum_j \varepsilon_j e^{-\varepsilon_j/kT}}{q}$ (5)
From the average energy equation, the probability that a molecule is in the $j$th energy state is:
(3) $\pi_j = \frac{e^{-\varepsilon_j/kT}}{\sum_i e^{-\varepsilon_i/kT}} = \frac{e^{-\varepsilon_j/kT}}{q}$ (5)
where $k$ and $T$ are Boltzmann's constant and the temperature, respectively, $Q$ is the partition function of the whole system, and $q$ is the partition function of a single molecule:
(4) $Q(N,V,T) = \sum_j e^{-E_j/kT}$, and
(5) $q(V,T) = \sum_j e^{-\varepsilon_j/kT}$.
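Equations (2), (3), and (5) can be evaluated directly for a small set of energy levels. The following is a minimal sketch, assuming a hypothetical three-level system with energies in kcal/mol; the function name, the energy values, and the number of molecules are illustrative and are not taken from the cited references.

```python
import math

# Boltzmann's constant expressed per mole (the gas constant), in kcal/(mol*K).
K_B = 0.0019872041

def boltzmann_populations(energies, temperature):
    """Return (q, probabilities, mean energy) for single-molecule energy levels."""
    beta = 1.0 / (K_B * temperature)
    # Shift by the lowest level for numerical stability; the probabilities
    # are unaffected because the shift cancels in the ratio.
    e_min = min(energies)
    weights = [math.exp(-beta * (e - e_min)) for e in energies]
    q_shifted = sum(weights)
    probabilities = [w / q_shifted for w in weights]                   # Equation (3)
    mean_energy = sum(p * e for p, e in zip(probabilities, energies))  # Equation (2)
    q = q_shifted * math.exp(-beta * e_min)                            # Equation (5)
    return q, probabilities, mean_energy

# Hypothetical three-level system (kcal/mol) at 300 K.
levels = [0.0, 0.5, 2.0]
q, probs, e_avg = boltzmann_populations(levels, 300.0)

# Equation (1): total energy of N independent molecules.
n_molecules = 1000
total_energy = n_molecules * e_avg
```

The lowest level carries the largest population, and the populations fall off exponentially with energy, which is the behavior Equation (3) describes.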
Since the molecules are independent and distinguishable from one another, $Q(N,V,T)$ can be written in terms of the single-molecule partition function, $Q = q^{N}$, so only $q(V,T)$ needs to be evaluated.(5) The goal of statistical mechanics is to calculate the partition function and the probabilities from the energy function. Two problems arise: first, an energy function must be designed that models the system accurately; second, the partition function must then be calculated by analytical or numerical techniques.(6) The accuracy of the calculations is determined by comparing the simulated data to experimental results. It is important to understand how these computational simulations are run before examining their results, which brings the discussion to empirical force fields.
Computer modeling can be a very useful and fairly accurate technique for smaller systems and molecules. Difficulties arise for large molecules of one hundred atoms or more, where the atoms are highly connected and quantum mechanics imposes restrictions on motion along the degrees of freedom. Modeling such a system with complete accuracy using quantum mechanics could take a lifetime. For this reason, several simplifications are used that still produce fairly accurate results. The first is the Born-Oppenheimer approximation, which relies on the large mass difference between nuclei and electrons: because the nuclei are so much heavier, they move much more slowly than the electrons. The electrons are considered to “move in a field produced by the nuclei fixed at some internuclear separation,” and only nuclear motion is considered.(5) The second approximation is to connect the atoms by “springs” and treat them as harmonic oscillators. The third is to derive parameters for small systems and apply them to the larger system of interest. All of this information is contained in what is called a force field.
A simulation is run by applying all of the constraints needed to model the system realistically and then calculating the forces on all atoms as the system moves as a collection of connected atoms with restrictions. These few simplifications carry over to most force fields, although some variations may be made.(19)
Two approaches may be taken when developing a force field. The first is through quantum mechanical calculations, as mentioned previously: the information is derived for simple systems, and thermodynamic and spectroscopic data are used to extend it to larger systems. Force fields developed using this method are known as semi-empirical. The second approach uses what is already known as the basis of the force field. Obtaining the forces from large molecular systems that are already known, and using the resulting potentials to describe the native state of unknown systems, is known as the empirical method.(6)(18) The most important part of a force field is the potential energy function, which describes how the energy of the system changes as the structure changes. One of the simplest potential energy functions is known as a “Class I additive” function:(7)
(6) $U(R) = \sum_{\text{bonds}} K_b (b - b_0)^2 + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2 + \sum_{\text{dihedrals}} K_\chi \left(1 + \cos(n\chi - \delta)\right) + \sum_{\text{impropers}} K_{imp} (\varphi - \varphi_0)^2 + \sum_{\text{nonbond}} \left( \varepsilon_{ij}\left[\left(\frac{R_{\min,ij}}{r_{ij}}\right)^{12} - \left(\frac{R_{\min,ij}}{r_{ij}}\right)^{6}\right] + \frac{q_i q_j}{\varepsilon_1 r_{ij}} \right)$
Here $b$ is the bond length, $\theta$ is the valence angle, $\chi$ is the dihedral angle, $\varphi$ is the improper angle, $r_{ij}$ is the distance between atoms $i$ and $j$, and $\varepsilon_1$ in the Coulombic term is the effective dielectric constant.
This function can be adjusted to represent the actual system of interest by adjusting the parameters of the equation.
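The individual terms of Equation (6) can be evaluated one at a time and then summed. The following is a minimal sketch, assuming angles in radians, distances in Å, charges in units of the elementary charge, and energies in kcal/mol. The function names, the toy parameter values, and the unit-conversion constant are illustrative assumptions and do not come from any particular force field.

```python
import math

# Approximate factor converting e^2/Angstrom to kcal/mol. This constant is an
# assumption added for units; Equation (6) itself is written without it.
COULOMB_KCAL = 332.0637

def bond_energy(k_b, b, b0):
    """Harmonic bond term K_b (b - b0)^2."""
    return k_b * (b - b0) ** 2

def angle_energy(k_theta, theta, theta0):
    """Harmonic valence-angle term K_theta (theta - theta0)^2."""
    return k_theta * (theta - theta0) ** 2

def dihedral_energy(k_chi, n, chi, delta):
    """Periodic dihedral term K_chi (1 + cos(n*chi - delta))."""
    return k_chi * (1.0 + math.cos(n * chi - delta))

def improper_energy(k_imp, phi, phi0):
    """Harmonic improper term K_imp (phi - phi0)^2."""
    return k_imp * (phi - phi0) ** 2

def nonbonded_energy(eps_ij, rmin_ij, q_i, q_j, r_ij, dielectric=1.0):
    """Lennard-Jones plus Coulomb term for one atom pair, as written in Equation (6)."""
    ratio6 = (rmin_ij / r_ij) ** 6
    lennard_jones = eps_ij * (ratio6 * ratio6 - ratio6)
    coulomb = COULOMB_KCAL * q_i * q_j / (dielectric * r_ij)
    return lennard_jones + coulomb

def class_one_energy(bonds, angles, dihedrals, impropers, pairs):
    """Sum every term of Equation (6); each argument is a list of parameter tuples."""
    return (
        sum(bond_energy(*t) for t in bonds)
        + sum(angle_energy(*t) for t in angles)
        + sum(dihedral_energy(*t) for t in dihedrals)
        + sum(improper_energy(*t) for t in impropers)
        + sum(nonbonded_energy(*t) for t in pairs)
    )

# Hypothetical toy system: one bond, one angle, one dihedral, one improper, one pair.
u_total = class_one_energy(
    bonds=[(300.0, 1.54, 1.53)],
    angles=[(50.0, math.radians(112.0), math.radians(109.5))],
    dihedrals=[(0.2, 3, math.radians(60.0), 0.0)],
    impropers=[(20.0, 0.02, 0.0)],
    pairs=[(0.1, 3.8, 0.25, -0.25, 4.5)],
)
```

With the Lennard-Jones term exactly as transcribed in Equation (6), the term is zero at $r_{ij} = R_{\min,ij}$. Some implementations write the attractive part with a factor of 2 so that the minimum falls exactly at $R_{\min,ij}$; that variant is not what is sketched here.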
The parameters representing the intramolecular interactions are the bond force constant, $K_b$; the equilibrium bond length, $b_0$; the valence angle force constant, $K_\theta$; the equilibrium angle, $\theta_0$; the dihedral force constant, $K_\chi$; the multiplicity, $n$; the phase angle, $\delta$; the improper force constant, $K_{imp}$; and the equilibrium improper angle, $\varphi_0$.(7) Those representing the non-bonded interactions are the partial atomic charges, $q$; the Lennard-Jones well depth, $\varepsilon_{ij}$; and the minimum interaction radius, $R_{\min,ij}$. Potential energy functions of this kind are composed of a combination of functions describing bonds, angles, distortions, and interactions between atoms. The model in Equation (6) treats the non-bonded interactions between atoms with a Lennard-Jones term that represents repulsion and dispersion, while electrostatics are treated with the last (Coulombic) term. Bonds, angles, and dihedral angles are also modeled in this potential function. This functional form is commonly used in CHARMM, AMBER, GROMOS, and OPLS, which are examples of biomolecular force fields used throughout research on biological systems.(7) Such additive force fields have been shown to model hydrogen bonding reasonably accurately through the combination of the Coulombic and Lennard-Jones terms.
The simple form shown in Equation (6) can be altered to model the system of interest more accurately. Possible alterations include adjustments for electronic polarizability, combining rules, 1,4 interactions, lone pairs, all-atom vs. united-atom representations, treatment of solvation, and treatment of long-range interactions. Electronic polarization occurs whenever the electron density shifts in response to an applied electric field. Methods to treat electronic polarizability can be added as an extension of Equation (6) and are used for the treatment of non-bonded interactions in biomolecular force fields.
The energy associated with the polarization is:
(7) $U_{pol} = \frac{1}{2} \sum_i \mu_i E_i$
where $\mu_i$ is the dipole moment of atom $i$ and $E_i$ is the electrostatic field that atom $i$ experiences. The common methods to treat polarizability are induced dipole models, fluctuating charge models, or a combination of both.(7) The induced dipole method combines the electrostatic field due to the static charges, $E_i^{0}$, with the field from the dipole moments on the other atoms $j$, which enters through a dipole field tensor, $T_{ij}$:
(8) $\mu_i = \alpha_i \left[ E_i^{0} - \sum_{j \neq i} T_{ij}\mu_j \right]$ (7)
where $\alpha_i$ is the polarizability of atom $i$. The fluctuating charge model allows the partial atomic charges to redistribute until the electronegativity is equal on each atom. This changes the overall molecular dipole moment rather than the individual dipole moments of each atom:
(9) $E_{elec} = \sum_i \left( \chi_i q_i + \frac{1}{2} J_i q_i^{2} \right)$
where each atom is assigned an electronegativity, $\chi_i$ (not to be confused with the dihedral angle in Equation (6)), and a hardness, $J_i$. Work on polarization has been shown to model small systems accurately, but the treatment of large biomolecular systems remains a struggle. One suggested reason for this inaccuracy is the inability to “transfer the gas phase molecular polarizabilities to the condensed phase.” (7)
Treatment of 1,4 interactions must also be added to the force field, since only atoms separated by one or two bonds (1,2 and 1,3 pairs) are covered by the internal parameters. All-atom force fields treat every atom in the molecule explicitly, whereas united-atom force fields omit explicit hydrogen atoms as an added simplification. The choice depends on the system of interest, since hydrogen interactions should still be considered for polar hydrogens. For biomolecular systems, accurate modeling of a condensed aqueous environment is important, so solvation is handled with explicit or implicit models. Explicit models treat the system more accurately but can require more computer time.
Explicit water models include TIP3P, TIP4P, SPC, the extended SPC/E, and F3C.(7) Implicit models give reasonably accurate results while saving computer time. The major contributor to the accuracy of the implicit models is the Poisson-Boltzmann model, which is derived from Poisson's equation:
(10) $\nabla^{2}\phi = -\frac{4\pi\rho}{\varepsilon}$ (5)
which relates the potential, $\phi$, to the charge density, $\rho$, of the atoms. Debye-Hückel theory approximates the potential of mean force as:
(11) $w_{1s}(r) = q_s \langle \psi_1(r) \rangle = q_s \phi_1(r), \quad r > a$ (5)
with the radii of all the ions taken to be $a/2$. Substituting Equation (11) into Equation (10) gives the Poisson-Boltzmann equation:
(12) $\nabla^{2}\phi_1(r) = -\frac{4\pi}{\varepsilon} \sum_s c_s q_s \exp\left(-\beta q_s \phi_1(r)\right), \quad r > a$ (5)
This equation can then be used to calculate the electrostatic free energy of the system. Since the equation is non-linear, Debye-Hückel theory makes a further approximation:
(13) $\sum_s c_s q_s \exp\left(-\beta q_s \phi_1(r)\right) \approx \sum_s c_s q_s - \beta \sum_s c_s q_s^{2} \phi_1(r)$ (5)
which linearizes the right-hand side. Here $\beta = 1/kT$, and $c_s$ and $q_s$ are the bulk concentration and charge of ionic species $s$. The main drawback of the Poisson-Boltzmann model is that it can be computationally expensive. For this reason, a generalized Born model for solvation can be used instead. Treatment of long-range interactions can be the most computationally expensive aspect of the simulation. This cost can be reduced by truncating the electrostatic and Lennard-Jones interactions at specific distances.
It is now clear that many variations of the potential energy function can be made to model the system of interest accurately. The potential energy function alone does not make up the force field, however: a force field is not complete until its parameters are set.(7) The quality of the force field depends significantly on the selected parameters, which can be obtained through optimization of the force field.
Optimization of a force field for proteins is usually performed by determining the parameters from quantum mechanical results for amino acids that represent the overall protein.(9) Quantum mechanical data on geometries can be used to optimize the bond constants, the equilibrium valence angles, and the dihedral multiplicities and phases, while data obtained as a function of rotation about bonds are used to optimize the dihedral parameters. Some issues arise because these quantum mechanical calculations describe the gas phase and are not entirely accurate for the condensed phase. For this reason, “the capability to carefully examine geometric, vibrational, and conformational properties, allowing for quantification of condensed phase contributions, is limited to systems for which extensive experimental data is available.”(7) This means that, to model protein interactions accurately, a large amount of experimental data must be obtained before proper optimization can occur.
Protein force fields use both united-atom and all-atom representations. All-atom protein models are provided by the OPLS/AA, CHARMM22, and AMBER (PARM99) force fields, although their use generally excludes protein-folding studies. AMBER stands for Assisted Model Building with Energy Refinement and was developed for use with nucleic acids and proteins. It was calibrated against experimental bond lengths and angles from microwave and neutron diffraction studies, together with accurate quantum chemical studies. OPLS force fields are also optimized for amino acids and proteins.(4) OPLS/AA uses simple Coulomb and Lennard-Jones interactions, as shown in Equation (6), and is compatible with the TIP4P, TIP3P, and SPC water models.(4) The majority of the force fields used to study the protein-folding problem are knowledge-based force fields.
These are parameterized to yield free energies directly, whereas the previously discussed methods obtain thermodynamic quantities from statistical mechanics calculations using the potential energy functions.(7)
The protein-folding problem can be modeled in two ways. The first is to construct an energy function whose “global minima along lines of constant sequence correspond to the native state(s) of amino acid sequences.”(6)(15) The global minima can then be located along the line of constant sequence. This first type is called the simple lattice model. The second is to use an inverse folding technique, known as an atomistic model. This method draws on the small number of known structures and attempts to identify the sequences in this database that fit into a known, desired fold. “The sequence structure space is explored along lines of constant conformation.”(6) The one problem that arises with the inverse folding method is that the global minima may not lie along the lines of constant conformation. A combination of these two methods has led to a new view of the protein-folding problem.
Studies:
Many studies have taken different approaches to adjusting force fields to depict protein folding. One group from the University of Maryland argued that accurate modeling of the protein backbone has been a limiting factor in successfully tackling the protein-folding problem, and that the difficulty arises from improper treatment of the dihedral terms. Their study used the CHARMM potential energy function and attempted to modify it to treat the conformational energies of the protein backbone dihedrals accurately. Their initial attempts introduced φ/ψ dihedral cross terms, but these did not reproduce the energy differences correctly. A further attempt introduced a grid-based energy correction to the 2D dihedral surface.(6)
A similar study was performed at the University of Florida using the AMBER force field. It was noted that the AMBER force field also lacks the correct backbone dihedral parameters. The study found that, because glycine differs in nature from the other amino acids, two sets of dihedral parameters are required, the first optimized solely for glycine. The resulting parameter set was shown to give better agreement with experimental data for short glycine and alanine peptides in water. Simulations of larger systems were also compared against the unmodified force fields, and the newly derived parameters performed best when calculated relaxation order parameters were compared to NMR values.(8)(20)
Another study used Monte Carlo simulations to examine several hundred sequences. Each heteropolymer had a random sequence of monomers of many different types, generated at random using a Gaussian distribution.(4) The study found that some sequences reached the global energy minimum in a small number of steps, while others were unable to find it.
When these results were analyzed further, it was found that the sequences able to reach the global minimum had a smaller energy separation between local minimum states, while those that did not minimize had too large a gap to the next lowest energy state.(4) The protein-folding pathway suggested by this experiment has three stages. The first is a collapse to a “semi-compact random globule”, followed by a second, rate-limiting transition-state step in which the protein searches for the native state. In the final stage, the protein progresses rapidly from the transition state to the native state.(4)
Atomistic simulations have also been run under high-temperature molecular dynamics conditions. These results are not always directly relevant to the folding mechanism, although other denaturing conditions can also be modeled. The ultimate goal is a fully atomistic simulation of the protein-folding process that uses an explicit representation of the solvent. The obstacle is the length of the simulation. In one study, a 36-residue peptide was simulated starting from an unfolded state. The simulation was run for 1 microsecond in a truncated octahedron simulation box containing approximately 3000 water molecules, with a time step of 2 femtoseconds. This “simple” model took about four months to complete on a “256 massively parallel supercomputer”.(4) The protein did not fold completely, as folding is expected to take approximately 10 to 100 microseconds. The simulation was monitored through the RMS deviation from the experimental structure, the radius of gyration, the fraction of native contacts present, and the solvation free energy.(4) “As computer power increases we are likely to see more studies of this type.”(4)
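The scale of the simulation described above, and two of the quantities used to monitor it, can be made concrete with a short calculation. The following is a minimal sketch, assuming coordinates are stored as plain lists of (x, y, z) tuples and that the two structures compared in the RMS deviation have already been superimposed (no fitting is performed). The function names are illustrative and are not taken from the cited study.

```python
import math

def number_of_steps(total_time_fs, time_step_fs):
    """Integration steps needed to cover total_time_fs with a fixed time step."""
    return round(total_time_fs / time_step_fs)

def center_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    )

def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    if masses is None:
        masses = [1.0] * len(coords)  # assumption: equal masses if none are given
    com = center_of_mass(coords, masses)
    weighted = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / sum(masses))

def rmsd(coords, reference):
    """RMS deviation between two pre-aligned structures with matching atom order."""
    if len(coords) != len(reference):
        raise ValueError("structures must contain the same number of atoms")
    squared = sum(
        sum((a[k] - b[k]) ** 2 for k in range(3))
        for a, b in zip(coords, reference)
    )
    return math.sqrt(squared / len(coords))

# 1 microsecond = 1e9 femtoseconds, integrated with a 2 fs time step.
steps = number_of_steps(1.0e9, 2.0)
```

A 1 microsecond trajectory at a 2 femtosecond time step therefore requires 5 × 10^8 integration steps, and reaching the expected 10 to 100 microsecond folding time would take 10 to 100 times as many.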
Many adjustments can be made when running these computations. A group from the California Institute of Technology used a stochastic model Hamiltonian to study a complex chemical reaction such as protein folding. They also describe many ways to improve on their model, including accounting for correlations in the energies of protein states and better dynamics near the glass transition temperature.(10) Another group, from France, modeled the folding process in two stages: stage 1 is the formation of alpha-helices across the lipid bilayer, and stage 2 involves their interactions to form transmembrane structures.(11) Other models of this problem have applications across many fields. A group based in Beijing completed work on proteins at high concentration, which applies to conditions found in neurodegenerative diseases.(14)
Conclusion:
The development of force fields is a complex process involving potential energy functions, optimized parameters, and many approximations. These approximations simplify the algorithms and directly reduce the length of the simulation, while still giving fairly accurate results. When modeling condensed-phase systems such as proteins in solvent, gas-phase quantum mechanical calculations may not always fit accurately. It is important not to use force fields beyond their range of validity. For this reason, many different force fields have been developed, including the few discussed here: OPLS/AA, CHARMM22, and AMBER (PARM99). It is not uncommon to see research groups modifying these force fields, as in the examples provided, to model their specialized systems of interest. Determining accurate methods for modeling protein folding could contribute significantly to the treatment of diseases including Alzheimer's disease, cystic fibrosis, and even cancer. Even though many methods have been tested, all of them have some problems.
No universal solution exists to date, although more successful models have been attempted.
References:
(1) Moore, J. W., Stanitski, C. L., Jurs, P. C. Chemistry: The Molecular Science. 4th ed. Belmont, CA: Brooks/Cole; 2011
(2) Carter, J. S. Amino Acids and Proteins. University of Cincinnati. [2004, Nov., 02; 2011, Mar., 18]. Available from: http://biology.clc.uc.edu/courses/bio104/protein.htm
(3) Thomasson, W. A. Unraveling the Mystery of Protein Folding. Breakthroughs in Bioscience. [2011, March, 18]. Available from: http://www.faseb.org/portals/0/pdfs/opa/protfold.pdf
(4) Leach, A. R. Molecular Modeling: Principles and Applications. 2nd ed. Pearson Education EMA; 2001
(5) McQuarrie, D. A. Statistical Mechanics. New York: Harper & Row; 1973
(6) Sippl, M. (1993), Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J. Computer-Aided Molecular Design, 7(4): 473–501. http://dx.doi.org/10.1007/BF02337562
(7) Mackerell, A. D. (2004), Empirical force fields for biological macromolecules: Overview and issues. J. Comput. Chem., 25:1584–1604. http://dx.doi.org/10.1002/jcc.20082
(8) Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A. and Simmerling, C. (2006), Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins, 65: 712–725. http://dx.doi.org/10.1002/prot.21123
(9) Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29(31), 7133–7155. http://dx.doi.org/10.1021/bi00483a001
(10) Bryngelson, J. D., & Wolynes, P. G. (1989). Intermediates and barrier crossing in a random energy model (with applications to protein folding). The Journal of Physical Chemistry, 93, 6902–6915. http://dx.doi.org/10.1021/j100356a007
(11) Popot, J.-L., & Engelman, D. M. (1990). Membrane protein folding and oligomerization: the two-stage model. Biochemistry, 29(17), 4031–4037. http://dx.doi.org/10.1021/bi00469a001
(12) 1991 Nature Publishing Group http://www.nature.com/naturebiotechnology. (1991). http://dx.doi.org/10.1038/nbt0991-825
(13) Fodor, L. (2012). Book reviews. Acta veterinaria Hungarica, 60(4), 529–31. http://dx.doi.org/10.1556/AVet.2012.047
(14) Lu, D., & Liu, Z. (2008). Oscillatory molecular driving force for protein folding at high concentration: a molecular simulation. The journal of physical chemistry. B, 112(9), 2686–93. http://dx.doi.org/10.1021/jp076940o
(15) Gromiha, M. M. (2005). A statistical model for predicting protein folding rates from amino acid sequence with structural class information. Journal of chemical information and modeling, 45(2), 494–501. http://dx.doi.org/10.1021/ci049757q
(16) Pascher, T., Chesick, J. P., Winkler, J. R., & Gray, H. B. (1996). Protein folding triggered by electron transfer. Science (New York, N.Y.), 271(5255), 1558–60. http://dx.doi.org/10.1021/ar970078t
(17) Brooks, C. L. (2002). Protein and peptide folding explored with molecular simulations. Accounts of chemical research, 35(6), 447–54. http://dx.doi.org/10.1021/ar0100172
(18) Bai, Y. (2006). Protein folding pathways studied by pulsed- and native-state hydrogen exchange. Chemical reviews, 106(5), 1757–68. http://dx.doi.org/10.1021/cr040432i
(19) Naganathan, A. N., Doshi, U., Fung, A., Sadqi, M., & Muñoz, V. (2006). Dynamics, energetics, and structure in protein folding. Biochemistry, 45(28), 8466–75. http://dx.doi.org/10.1021/bi060643c
(20) Srivastava, K. R., Kumar, A., Goyal, B., & Durani, S. (2011). Stereochemistry and solvent role in protein folding: nuclear magnetic resonance and molecular dynamics studies of poly-L and alternating-L,D homopolypeptides in dimethyl sulfoxide. The journal of physical chemistry. B, 115(20), 6700–8. http://dx.doi.org/10.1021/jp200743w
Introduction:
Biological macromolecules have been of great study in the recent years in the field of Computational Chemistry. These macromolecules consist of smaller organic molecules combined to form longer complex chains.(1) There are four categories of macromolecules; carbohydrates, proteins, lipids, and nucleic acids. Of the four categories, proteins are the “most structurally complex macromolecules known”.(2) For this reason, the primary focus of this paper is on proteins, and more specifically, protein folding. Proteins are considered polymers consisting of a combination of amino acid monomers. Approximately 20 different amino acids are found in the human body, serving a number of purposes. Proteins are formed through the combination of these amino acids through dehydration synthesis and the formation of an amide linkage, or peptide bond.(2)
Different combinations of these amino acids result in different chemical properties, physical properties, and biological function exhibited by the protein, making each protein unique. After a protein is formed, it is considered to be in an “unfolded” state. The atoms will then adjust and reach an “intermediate” stage before finally reaching the “native” conformation. This process is known as protein folding has been the focus of many recent studies as scientists have realized the importance of the folding process. Under various conditions, research has shown that aggregates may form instead of the native state.(12) These aggregates have been found to cause many problems including Alzheimer's disease, Mad Cow disease, and even Cystic fibrosis.(3) The p53 protein is associated with preventing cells with damaged DNA from dividing. This protein serves a huge purpose, to prevent cancer. One mutation associated with the p53 protein is a result of not having enough folded in the proper form to prevent cell division. It is clear that protein folding must be studied in further depth to understand the process and potentially how to manipulate it.(3)
Theory:
Initially, understanding protein folding seems like an intractable problem. Unraveling this problem can start with determining the native state of proteins. Determining these structures experimentally is difficult, which is why the focus has shifted to theoretical methods of determining the conformational structures, which is known as the “protein folding problem.”(4) Conformational analysis is the study of these 3-D conformations and the associated changes in properties.(4) Conformational analysis studies systems that are in equilibrium, which means the system will reach a global minimum of free energy. In terms of proteins, this global minimum is related to the native state of the protein. The macroscopic state of the system at a global minimum contains many individual atoms that adopt different microscopic states. Boltzmann's principle describes the system in this state and relates the energy to the probability density function, p. The use of Boltzmann's statistics requires high temperature limits and is also called the classical limit.(5) The total energy of the N-body system is described as follows:
(1) E = Nέ = kT2(dlnQ/dT)N,V = N Σj εj exp(-εj/kT)/q(5)
The summation is taken over the energies because the molecules are considered to be independent of one another. The average energy of a particle is then:
(2) έ = Σj εj exp(-εj/kT)/q(5)
From the average energy equation, the probability that a molecule is in the jth energy state is:
(3) πj = exp(-εj/kT)/Σj exp(-εj/kT) = exp(-εj/kT)/q(5)
where k and T are Boltzmann's constant and the temperature, respectively, and Q is the partition function:
(4) Q(N,V,T) = Σj exp(-Ej/kT), and
(5) q(V,T) = Σj exp(-εj/kT).
Since the molecules are distinguishable from one another, Q(N,V,T) can be reduced to q(V,T).(5) The goal of statistical mechanics is to calculate the partition function and probability from the energy function. Several problems can arise from attempting to calculate these, the first being the design of an energy function to model the system accurately and then being able to calculate the partition function by analytical or numerical techniques.(6) Accuracy of the calculations is determined by comparing the simulated data to experimental results. It is important to understand how these computational simulations are run before examining their results, which brings the discussion to empirical force fields.
Computer modeling can serve as a very useful and fairly accurate technique when it comes to smaller systems and molecules. The issue is when discussing large molecules of one hundred atoms or more where the atoms are highly connected and quantum mechanics impose restrictions on movement of degrees of freedom. Modeling such a system with complete accuracy using quantum mechanical considerations could take a lifetime. For this reason, several simplifications can be used to still produce fairly accurate results. The first approximation is the Born-Oppenheimer approximation. This approximation takes into account the size difference between the nuclei and electrons, and therefore the nuclei move much slower than the electrons. The electrons are considered to “move in a field produced by the nuclei fixed at some internuclear separation,” and only nuclear motion is considered.(5) The second approximation is to connect the atoms by “springs” and treat them as harmonic oscillators. The third approximation is to derive parameters for small systems and apply them to the larger system of interest. All of this information will be contained in what is called a force field. A simulation can be run by applying all of the constraints to realistically model the system and calculating the forces on all atoms by moving the system as a collection of connected atoms with restrictions. These few simplifications are usually transferred throughout most force fields, although some variations may be made.(19)
Two approaches may be taken when developing a force field. The first being through quantum mechanical calculations, as mentioned previously, by deriving this information for simple systems and using thermodynamic information along with spectroscopic data to expand this to the larger systems. Force fields developed using this method are known as semi-empirical. Another method is through using what is known already as the basis of the force field. Obtaining the forces of the large molecular systems already known and using the potentials to describe the native state for unknown systems is known as the empirical method.(6)(18) The most important part of a force field is the potential energy function that is used to describe how the energy of the system changes as the structure changes. One of the simplest potential energy functions is known as a “Class I additive”.(7)
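(6) $U(R) = \sum_{bonds} K_b (b - b_0)^2 + \sum_{angles} K_\theta (\theta - \theta_0)^2 + \sum_{dihedrals} K_\chi \left(1 + \cos(n\chi - \delta)\right) + \sum_{impropers} K_{imp} (\phi - \phi_0)^2 + \sum_{nonbond} \left( \varepsilon_{ij} \left[ \left(\frac{R_{min,ij}}{r_{ij}}\right)^{12} - \left(\frac{R_{min,ij}}{r_{ij}}\right)^{6} \right] + \frac{q_i q_j}{\varepsilon\, r_{ij}} \right)$
where b is the bond length, θ is the valence angle, χ is the dihedral angle, φ is the improper angle, $r_{ij}$ is the distance between atoms i and j, and ε without subscripts in the final term is the effective dielectric constant.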
This function can be adapted to the actual system of interest by adjusting the parameters of the equation. The parameters representing the intramolecular interactions are the bond force constant, $K_b$, and equilibrium bond length, $b_0$; the valence angle force constant, $K_\theta$, and equilibrium angle, $\theta_0$; the dihedral force constant, $K_\chi$, multiplicity, n, and phase angle, δ; and the improper force constant, $K_{imp}$, and equilibrium improper angle, $\phi_0$.(7) Those representing the non-bonded interactions are the partial atomic charges, q, the Lennard-Jones well depth, $\varepsilon_{ij}$, and the minimum interaction radius, $R_{min,ij}$. The potential energy function is thus composed of a combination of terms that describe bonds, angles, distortions, and interactions between atoms. The model shown in Equation (6) treats the repulsion and dispersion between non-bonded atoms with a Lennard-Jones term, and electrostatics with the Coulombic term at the end of the equation. Bonds, angles, and dihedral angles are modeled by the remaining terms. This functional form is commonly used in CHARMM, AMBER, GROMOS, and OPLS, which are biomolecular force fields used throughout research on biological systems.(7) Such additive force fields have been shown to model hydrogen bonding reasonably accurately through the combination of the Coulombic and Lennard-Jones terms.
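The individual terms of Equation (6) can be sketched in a few lines of Python. This is only an illustration of the functional form as it is written above, not the implementation used by CHARMM, AMBER, GROMOS, or OPLS. The function names are hypothetical, the Coulomb prefactor is taken as 1 (an assumption about units), and angles are assumed to be in radians.

```python
import math


def bond_energy(b, k_b, b0):
    """Harmonic bond term K_b (b - b0)^2."""
    return k_b * (b - b0) ** 2


def angle_energy(theta, k_theta, theta0):
    """Harmonic valence angle term K_theta (theta - theta0)^2."""
    return k_theta * (theta - theta0) ** 2


def dihedral_energy(chi, k_chi, n, delta):
    """Periodic dihedral term K_chi (1 + cos(n chi - delta))."""
    return k_chi * (1.0 + math.cos(n * chi - delta))


def improper_energy(phi, k_imp, phi0):
    """Harmonic improper term K_imp (phi - phi0)^2."""
    return k_imp * (phi - phi0) ** 2


def nonbonded_energy(r, eps_ij, rmin_ij, q_i, q_j, dielectric=1.0):
    """Lennard-Jones plus Coulomb energy of one atom pair, as in Equation (6).

    Assumption: units are chosen so that the Coulomb prefactor equals 1.
    """
    ratio6 = (rmin_ij / r) ** 6
    lennard_jones = eps_ij * (ratio6 ** 2 - ratio6)
    coulomb = q_i * q_j / (dielectric * r)
    return lennard_jones + coulomb


# Illustrative (made-up) parameters: a slightly stretched bond and one charged pair.
stretched_bond = bond_energy(b=1.10, k_b=300.0, b0=1.00)
pair = nonbonded_energy(r=3.0, eps_ij=0.1, rmin_ij=3.5, q_i=0.4, q_j=-0.4)
```

The total energy U(R) is the sum of such terms over all bonds, angles, dihedrals, impropers, and non-bonded pairs in the structure.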
This simple form shown in Equation (6) can be altered to model the system of interest more accurately. Possible alterations include adjustments for electronic polarizability, combining rules, 1,4 interactions, lone pairs, all-atom versus united-atom representations, treatment of solvation, and treatment of long-range interactions. Electronic polarization occurs whenever the electron density shifts in response to an applied electric field. Methods to treat electronic polarizability can be added as an extension of Equation (6) and are used for the treatment of non-bonded interactions in biomolecular force fields. The energy associated with the polarization is:
(7) $U_{pol} = \frac{1}{2} \sum_i \mu_i E_i$
where $\mu_i$ is the dipole moment of atom i and $E_i$ is the electrostatic field that atom i experiences. The common methods to treat polarizability are induced dipole treatments, fluctuating charge models, or a combination of both.(7) The induced dipole method uses the atomic polarizability, $\alpha_i$, and a dipole field tensor, $T_{ij}$, to model the dipole induced on atom i by the electrostatic field due to the static charges, $E_i^0$, and by the dipole moments on the other atoms j:
(8) $\mu_i = \alpha_i \left[ E_i^0 - \sum_{j \neq i} T_{ij} \mu_j \right]$ (7)
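Because each induced dipole in Equation (8) depends on all of the others, the dipoles are usually found self-consistently. The sketch below illustrates one way to do this by simple fixed-point iteration. It assumes a simplified scalar (one-dimensional) model in which the dipoles, fields, and tensor elements are plain numbers; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_induced_dipoles(alpha, e0, t, tol=1e-10, max_iter=1000):
    """Iterate mu_i = alpha_i * (E0_i - sum_{j != i} T_ij mu_j) to self-consistency."""
    alpha = np.asarray(alpha, dtype=float)
    e0 = np.asarray(e0, dtype=float)
    t = np.array(t, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(t, 0.0)          # exclude the j == i self term
    mu = alpha * e0                   # starting guess: no dipole-dipole coupling
    for _ in range(max_iter):
        mu_new = alpha * (e0 - t @ mu)
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new
        mu = mu_new
    raise RuntimeError("induced dipoles did not converge")


# Two coupled sites with made-up polarizabilities, fields, and coupling.
alpha = [1.0, 1.0]
e0 = [1.0, 0.5]
t = [[0.0, 0.2],
     [0.2, 0.0]]
mu = solve_induced_dipoles(alpha, e0, t)

# The same answer follows from solving (I + diag(alpha) T) mu = alpha * E0 directly.
mu_direct = np.linalg.solve(np.eye(2) + np.diag(alpha) @ np.array(t), np.array(alpha) * np.array(e0))
```

The iteration only converges when the coupling is weak enough; the direct linear solve is shown as a cross-check for this small example.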
The fluctuating charge model allows the partial atomic charges to redistribute until the electronegativity is equalized on every atom. This changes the overall molecular dipole moment rather than the individual dipole moment of each atom:
(9) $E_{elec} = \sum_i \left( \chi_i q_i + \frac{1}{2} J_i q_i^2 \right)$
where each atom is assigned an electronegativity, χ, and a hardness, J. Most of the work on polarization has been shown to model small systems accurately, but the treatment of large biomolecular systems remains difficult. One suggested reason for this inaccuracy is the inability to “transfer the gas phase molecular polarizabilities to the condensed phase.”(7)
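As an illustration of how charges could be obtained from Equation (9), the sketch below minimizes the energy subject to a fixed total charge using a Lagrange multiplier. Setting the derivative with respect to each charge equal to a common value λ gives χ_i + J_i q_i = λ, which is the electronegativity equalization condition. The function name is hypothetical, and, following Equation (9) as written, no Coulomb coupling between different atoms is included.

```python
def equalize_charges(chi, hardness, total_charge=0.0):
    """Minimize sum_i (chi_i q_i + 0.5 J_i q_i^2) subject to sum_i q_i = total_charge.

    Returns the charges and the common (equalized) electronegativity lambda.
    """
    inverse_hardness_sum = sum(1.0 / j for j in hardness)
    lam = (total_charge + sum(c / j for c, j in zip(chi, hardness))) / inverse_hardness_sum
    charges = [(lam - c) / j for c, j in zip(chi, hardness)]
    return charges, lam


# Two atoms with made-up parameters; the more electronegative atom (chi = 4.0)
# ends up with the negative charge.
charges, lam = equalize_charges(chi=[2.0, 4.0], hardness=[1.0, 1.0])
```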
Treatment of 1,4 interactions must also be addressed in the force field, since only the 1,2 and 1,3 atom pairs are covered by the internal parameters. All-atom force fields treat all of the atoms in the molecule explicitly, whereas united-atom force fields omit explicit hydrogen atoms as an added simplification. The choice can be made based on the system of interest, since hydrogen interactions should be considered for polar hydrogens. For biomolecular systems, accurate modeling of a condensed aqueous environment is important, so solvation is handled using explicit or implicit models. Explicit models treat the system more accurately but require more computer time. Explicit water models include TIP3P, TIP4P, SPC, the extended SPC/E, and F3C.(7) Implicit models can give reasonably accurate results while saving computer time. Much of the accuracy of the implicit models comes from use of the Poisson-Boltzmann model. This equation is derived from Poisson's equation:
(10) $\nabla^2 \phi = -\frac{4\pi\rho}{\varepsilon}$ (5)
relating the electrostatic potential to the charge density of the atoms. Debye-Hückel theory approximates the potential of mean force as follows:
(11) $w_{1s}(r) = q_s\, {}^{1}\langle \psi(r) \rangle = q_s \phi_1(r), \quad r > a$ (5)
with the radii of all of the ions taken to be a/2. Substituting Equation (11) into Equation (10), we arrive at the Poisson-Boltzmann equation:
(12) $\nabla^2 \phi_1(r) = -\frac{4\pi}{\varepsilon} \sum_s c_s q_s \exp\left(-\beta q_s \phi_1(r)\right), \quad r > a$ (5)
This equation can then be used to calculate the electrostatic free energy of the system. Since this equation is non-linear, Debye-Hückel theory makes a further approximation:
(13) $\sum_s c_s q_s \exp\left(-\beta q_s \phi_1(r)\right) \approx \sum_s c_s q_s - \beta \sum_s c_s q_s^2 \phi_1(r)$ (5)
which linearizes the right-hand side. The main drawback of the Poisson-Boltzmann model is that it can be computationally expensive. For this reason, a generalized Born model for solvation can be used instead.
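The effect of the linearization can be made explicit by substituting Equation (13) into Equation (12). The step below assumes overall electroneutrality of the solution, $\sum_s c_s q_s = 0$, which is a standard condition in Debye-Hückel theory but is not stated explicitly above. The symbol $\kappa$ is introduced here only as shorthand.

```latex
\nabla^2 \phi_1(r)
  = -\frac{4\pi}{\varepsilon}
    \left[ \sum_s c_s q_s - \beta \sum_s c_s q_s^2 \, \phi_1(r) \right]
  = \kappa^2 \phi_1(r),
\qquad
\kappa^2 = \frac{4\pi\beta}{\varepsilon} \sum_s c_s q_s^2,
\qquad r > a
```

The first sum vanishes by electroneutrality, leaving a linear equation for $\phi_1(r)$. The quantity $1/\kappa$ has units of length and sets the distance over which the potential around an ion is screened.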
Treatment of long-range interactions can be the most computationally expensive aspect of the simulation. This cost can be reduced by truncating the electrostatic and Lennard-Jones interactions at specific distances. Many variations of the potential energy function can thus be made to model the system of interest more accurately. The potential energy function alone does not make up the force field, however; a force field is not complete until its parameters have been set.(7)
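A minimal sketch of truncation, applied to the Lennard-Jones term of Equation (6), is given below. The function names and the cutoff value are hypothetical. The optional energy shift, which makes the energy go continuously to zero at the cutoff, is a common refinement and is an assumption that goes beyond what is described above.

```python
def lennard_jones(r, eps_ij, rmin_ij):
    """Lennard-Jones pair energy in the form written in Equation (6)."""
    ratio6 = (rmin_ij / r) ** 6
    return eps_ij * (ratio6 ** 2 - ratio6)


def truncated_lennard_jones(r, eps_ij, rmin_ij, cutoff, shift=False):
    """Lennard-Jones energy that is ignored (set to zero) at and beyond the cutoff.

    With shift=True the energy at the cutoff is subtracted, so the truncated
    energy approaches zero continuously instead of jumping at the cutoff.
    """
    if r >= cutoff:
        return 0.0
    energy = lennard_jones(r, eps_ij, rmin_ij)
    if shift:
        energy -= lennard_jones(cutoff, eps_ij, rmin_ij)
    return energy


# Made-up parameters: pairs beyond the 12.0 cutoff contribute nothing.
inside = truncated_lennard_jones(5.0, eps_ij=0.1, rmin_ij=3.5, cutoff=12.0)
outside = truncated_lennard_jones(15.0, eps_ij=0.1, rmin_ij=3.5, cutoff=12.0)
```

Skipping all pairs beyond the cutoff is what saves computer time, since the number of distant pairs grows much faster than the number of nearby ones.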
The quality of the force field is determined to a large extent by the selected parameters. These parameters can be obtained through optimization of the force field. For proteins, optimization is usually performed by determining the parameters from quantum mechanical results for the amino acids that represent the overall protein.(9) Data on the geometries can be used to optimize the bond constants, valence angle equilibrium constants, dihedral multiplicities, and phases. Obtaining the energy as a function of rotation about bonds allows the dihedral parameters to be optimized. Some issues may arise because these quantum mechanical calculations describe the gas phase and are not entirely accurate for the condensed phase. For this reason, “the capability to carefully examine geometric, vibrational, and conformational properties, allowing for quantification of condensed phase contributions, is limited to systems for which extensive experimental data is available.”(7) This means that to model protein interactions accurately, a large amount of experimental data must be obtained before proper optimization can occur.
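The optimization of dihedral parameters against an energy scan can be illustrated with a small least-squares fit. The sketch below assumes that the multiplicities are chosen in advance and that all phase angles are zero, so that the dihedral term of Equation (6) is linear in the force constants. The target energies are synthetic numbers generated inside the example, not quantum mechanical data, and the function name is hypothetical.

```python
import numpy as np


def fit_dihedral_force_constants(chi, energies, multiplicities):
    """Least-squares fit of K_n in E(chi) = sum_n K_n (1 + cos(n chi)), with delta = 0."""
    chi = np.asarray(chi, dtype=float)
    # One column per multiplicity; each column is the basis function 1 + cos(n chi).
    design = np.column_stack([1.0 + np.cos(n * chi) for n in multiplicities])
    k, *_ = np.linalg.lstsq(design, np.asarray(energies, dtype=float), rcond=None)
    return k


# Synthetic "scan" every 10 degrees, built from known constants K_1 = 1.5, K_3 = 0.4.
chi = np.linspace(-np.pi, np.pi, 37)
target = 1.5 * (1.0 + np.cos(chi)) + 0.4 * (1.0 + np.cos(3.0 * chi))

fitted = fit_dihedral_force_constants(chi, target, multiplicities=(1, 2, 3))
```

In this synthetic case the fit recovers the constants used to build the scan and assigns essentially zero weight to the unused n = 2 term. With real data the fit would be judged by how well the resulting curve reproduces the scanned energies.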
Protein force fields utilize both united-atom and all-atom methods. All-atom protein models are used in the OPLS/AA, CHARMM22, and AMBER (PARM99) force fields, but their use generally excludes protein-folding studies. AMBER stands for Assisted Model Building with Energy Refinement, and the force field was developed for use with nucleic acids and proteins. It was calibrated against experimental bond lengths and angles from microwave, neutron diffraction, and accurate quantum chemical studies. OPLS force fields are also optimized for amino acids and proteins.(4) OPLS/AA uses simple Coulomb and Lennard-Jones interactions, as shown in Equation (6), and is compatible with the TIP4P, TIP3P, and SPC water models.(4) The majority of the force fields used for studying the protein-folding problem are knowledge-based force fields. These are parameterized to yield free energies directly, whereas the previously discussed methods use thermodynamic quantities obtained from statistical mechanics calculations using the potential energy functions.(7) The modeling of the protein-folding problem can be approached in two ways. The first is to construct an energy function whose “global minima along lines of constant sequence correspond to the native state(s) of amino acid sequences.”(6)(15) The global minima can then be located along the line of constant sequence. This first type is called the simple lattice model. The second is to use an inverse folding technique known as an atomistic model. This method uses the small number of known structures and attempts to identify the sequences in this database that fit a known, desired fold. “The sequence structure space is explored along lines of constant conformation.”(6) The only problem that arises with the inverse folding method is that the global minima may not lie along the lines of constant conformation.
A combination of these two methods has led to a new view of the protein-folding problem.
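One common way in which a knowledge-based potential yields a free-energy-like quantity directly is to invert Boltzmann's principle: if a state is observed with relative frequency f in a database of known structures, its energy relative to a reference state is taken as -kT ln(f / f_ref). The sketch below shows this inversion for binned counts. The counts, the choice of reference distribution, and the function name are all hypothetical, and kT is set to 1 for simplicity.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, kT=1.0):
    """Convert binned counts into energies E = -kT ln(f_observed / f_reference)."""
    observed_total = sum(observed_counts)
    reference_total = sum(reference_counts)
    energies = []
    for observed, reference in zip(observed_counts, reference_counts):
        if observed <= 0 or reference <= 0:
            raise ValueError("every bin needs a positive count in this sketch")
        f_observed = observed / observed_total
        f_reference = reference / reference_total
        energies.append(-kT * math.log(f_observed / f_reference))
    return energies


# Made-up counts for three distance bins: the third bin is seen twice as often
# as in the reference, so it receives a favorable (negative) energy.
energies = inverse_boltzmann([10, 40, 50], [25, 50, 25])
```

Bins that are observed more often than in the reference state come out with negative energies, and rarely observed bins come out with positive energies.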
Studies:
Many studies have taken different approaches to adjusting force fields so that they depict protein folding. One group from the University of Maryland argues that accurate modeling of the protein backbone has been a limiting factor in successfully tackling the protein-folding problem. The difficulty arises from improper treatment of the dihedral terms. Their study used the CHARMM potential energy function and attempted to modify it to treat the protein backbone dihedral conformational energies accurately. Their initial attempts involved the introduction of φ/ψ dihedral cross terms, but these did not lead to correct reproduction of the energy differences. Another attempt introduced a grid-based energy correction to the two-dimensional dihedral surface.(6) A similar study was performed at the University of Florida using the AMBER force field. It was noted that the AMBER force field also lacks the correct backbone dihedral parameters. The study found that the different nature of glycine compared to the other amino acids requires two sets of dihedral parameters, with the first optimized solely for glycine. The resulting set of parameters was shown to give better agreement with the experimental data for short glycine and alanine peptides in water. Simulations were also run for larger systems and compared with the unmodified force fields, and the newly derived parameters performed best when calculated relaxation order parameters were compared to NMR values.(8)(20)
Another study used Monte Carlo simulations to examine several hundred sequences. Each was a heteropolymer with a random sequence of monomers of many different types, generated using a Gaussian distribution.(4) The study found that some sequences reached the global energy minimum in a small number of steps, while others were unable to find it. Further analysis showed that the sequences able to reach the global minimum had a smaller energy separation between local minimum states, whereas those that failed had too large a gap to the next-lowest energy state.(4) The protein-folding pathway suggested by this work has three stages. The first is a collapse to a “semi-compact random globule.” The second, rate-limiting stage is the transition-state step, in which the protein searches for the native state. In the final stage, the protein progresses rapidly from the transition state to the native state.(4)
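The move-acceptance step at the heart of such a Monte Carlo search can be sketched as follows. The acceptance rule used in the study is not specified above, so the Metropolis criterion shown here is an assumption: moves that lower the energy are always accepted, and moves that raise it by ΔE are accepted with probability exp(-ΔE/kT). The function names are hypothetical.

```python
import math
import random


def acceptance_probability(delta_e, kT):
    """Metropolis probability of accepting a move that changes the energy by delta_e."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / kT)


def metropolis_accept(delta_e, kT, rng):
    """Return True if a trial move is accepted; rng supplies uniform numbers in [0, 1)."""
    return rng.random() < acceptance_probability(delta_e, kT)


# Estimate how often an uphill move of size kT is accepted (seeded for repeatability).
rng = random.Random(0)
trials = 100_000
accepted = sum(metropolis_accept(1.0, 1.0, rng) for _ in range(trials))
acceptance_fraction = accepted / trials
```

Because uphill moves become exponentially less likely as the energy difference grows, the size of the energy gaps between states strongly affects how quickly a simulated chain can move between them.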
Atomistic simulations have also been run under high-temperature molecular dynamics conditions. These results are not always directly relevant to the folding mechanism, although other denaturing conditions can also be modeled. The goal is to achieve a fully atomistic simulation of the protein-folding process that uses an explicit representation of the solvent. The difficulty lies in the length of the simulation. One study was performed on a 36-residue peptide starting in an unfolded state. The simulation was run for 1 microsecond in a truncated octahedral simulation box with approximately 3000 water molecules and a time step of 2 femtoseconds. This “simple” model was completed after about four months on a “256 massively parallel supercomputer”.(4) The protein did not fold completely, as folding is expected to take approximately 10-100 microseconds. The simulation was monitored using the RMS deviation from the experimental structure, the radius of gyration, the fraction of native contacts present, and the solvation free energy.(4) “As computer power increases we are likely to see more studies of this type.”(4)
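The scale of this calculation follows from simple arithmetic on the figures quoted above. The sketch below counts the number of time steps and estimates the average wall-clock time per step. Taking "about four months" as 120 days is an assumption made only for this estimate.

```python
FEMTOSECONDS_PER_MICROSECOND = 10**9
SECONDS_PER_DAY = 24 * 3600


def number_of_steps(simulated_microseconds, time_step_femtoseconds):
    """Number of integration steps needed to cover the simulated time."""
    return simulated_microseconds * FEMTOSECONDS_PER_MICROSECOND // time_step_femtoseconds


# 1 microsecond of simulated time with a 2 femtosecond time step.
steps = number_of_steps(1, 2)

# Assumption: "about four months" is taken as 120 days of wall-clock time.
wall_clock_seconds = 120 * SECONDS_PER_DAY
seconds_per_step = wall_clock_seconds / steps

# Steps needed for the 10 to 100 microseconds that complete folding is expected to take.
steps_for_10_us = number_of_steps(10, 2)
steps_for_100_us = number_of_steps(100, 2)
```

Under these assumptions the run required 500 million steps at roughly 0.02 seconds each, and complete folding would require ten to one hundred times as many steps.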
Many adjustments can be made when running these computations. A group from California Technical Institute used a stochastic model Hamiltonian to study protein folding as a complex chemical reaction. They describe several ways to improve their model, including accounting for correlations in the energies of protein states and improving the dynamics near the glass transition temperature.(10) Another group, from France, described folding as a two-stage process: stage 1 is the formation of alpha-helices across the lipid bilayer, and stage 2 is the interaction of these helices to form transmembrane structures.(11) Other models of this problem have applications across many fields. A group from University of Beijing completed work on high concentrations of proteins, which applies to conditions found in neurodegenerative diseases.(14)
Conclusion:
The development of force fields is a complex process involving potential energy functions, optimized parameters, and many approximations. These approximations simplify the algorithms and shorten the simulations while still giving fairly accurate results. In the case of condensed-phase systems such as proteins in solvent, gas-phase quantum mechanical calculations may not always fit accurately, so it is important not to use force fields beyond their range of validity. For this reason, many different force fields have been developed, including the few that were discussed: OPLS/AA, CHARMM22, and AMBER (PARM99). It is not uncommon to see research groups modifying these force fields, as in the examples provided, to model their specialized systems of interest. Determining accurate methods for modeling protein folding could serve a significant purpose in the treatment of diseases including Alzheimer's disease, cystic fibrosis, and even cancer. Even though many methods have been tested, all of them have some problems. No universal solution exists to date, although increasingly successful models have been attempted.
References:
(1) Moore, J. W., Stanitski, C. L., Jurs, P. C. Chemistry: The Molecular Science. 4th ed. Belmont, CA: Brooks/Cole; 2011
(2) Carter, J. S. Amino Acids and Proteins. University of Cincinnati. [2004, Nov., 02; 2011, Mar., 18]. Available from: http://biology.clc.uc.edu/courses/bio104/protein.htm
(3) Thomasson, W. A. Unraveling the Mystery of Protein Folding. Breakthroughs in Bioscience. [2011, March, 18]. Available from: http://www.faseb.org/portals/0/pdfs/opa/protfold.pdf
(4) Leach, A. R. Molecular Modeling: Principles and Applications. 2nd ed. Pearson Education EMA; 2001
(5) McQuarrie, D. A. Statistical Mechanics. New York: Harper & Row; 1973
(6) Sippl, M. (1993), Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J. Computer-Aided Molecular Design, 4: 473-501. http://dx.doi.org/10.1007/BF02337562
(7) Mackerell, A. D. (2004), Empirical force fields for biological macromolecules: Overview and issues. J. Comput. Chem., 25:1584–1604. http://dx.doi.org/10.1002/jcc.20082
(8) Hornak, V., Abel, R., Okur, A., Strockbine, B., Roitberg, A. and Simmerling, C. (2006), Comparison of multiple Amber force fields and development of improved protein backbone parameters. Proteins, 65: 712–725. http://dx.doi.org/10.1002/prot.21123
(9) Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29(31), 7133–7155. http://dx.doi.org/10.1021/bi00483a001
(10) Bryngelson, J. D., & Wolynes, P. G. (1989). Intermediates and barrier crossing in a random energy model (with applications to protein folding). The Journal of Physical Chemistry, 93, 6902–6915. http://dx.doi.org/10.1021/j100356a007
(11) Popot, J.-L., & Engelman, D. M. (1990). Membrane protein folding and oligomerization: the two-stage model. Biochemistry, 29(17), 4031–4037. http://dx.doi.org/10.1021/bi00469a001
(12) 1991 Nature Publishing Group http://www.nature.com/naturebiotechnology. (1991). http://dx.doi.org/10.1038/nbt0991-825
(13) Fodor, L. (2012). Book reviews. Acta veterinaria Hungarica, 60(4), 529–31. http://dx.doi.org/10.1556/AVet.2012.047
(14) Lu, D., & Liu, Z. (2008). Oscillatory molecular driving force for protein folding at high concentration: a molecular simulation. The journal of physical chemistry. B, 112(9), 2686–93. http://dx.doi.org/10.1021/jp076940o
(15) Gromiha, M. M. (2005). A statistical model for predicting protein folding rates from amino acid sequence with structural class information. Journal of chemical information and modeling, 45(2), 494–501. http://dx.doi.org/10.1021/ci049757q
(16) Pascher, T., Chesick, J. P., Winkler, J. R., & Gray, H. B. (1996). Protein folding triggered by electron transfer. Science (New York, N.Y.), 271(5255), 1558–60. http://dx.doi.org/10.1021/ar970078t
(17) Brooks, C. L. (2002). Protein and peptide folding explored with molecular simulations. Accounts of chemical research, 35(6), 447–54. http://dx.doi.org/10.1021/ar0100172
(18) Bai, Y. (2006). Protein folding pathways studied by pulsed- and native-state hydrogen exchange. Chemical reviews, 106(5), 1757–68. http://dx.doi.org/10.1021/cr040432i
(19) Naganathan, A. N., Doshi, U., Fung, A., Sadqi, M., & Muñoz, V. (2006). Dynamics, energetics, and structure in protein folding. Biochemistry, 45(28), 8466–75. http://dx.doi.org/10.1021/bi060643c
(20) Srivastava, K. R., Kumar, A., Goyal, B., & Durani, S. (2011). Stereochemistry and solvent role in protein folding: nuclear magnetic resonance and molecular dynamics studies of poly-L and alternating-L,D homopolypeptides in dimethyl sulfoxide. The journal of physical chemistry. B, 115(20), 6700–8. http://dx.doi.org/10.1021/jp200743w